Fire up GraphLab Create



In [2]:

    
import graphlab as gl

Load a tabular dataset

SFrame is a tabular, column-mutable dataframe object that can scale to big data. Its data is stored column-wise on persistent storage (e.g. disk), so it is not constrained by memory size. Each column in an SFrame is a size-immutable SArray, but the SFrame itself is mutable: columns can be added and removed with ease. An SFrame essentially acts as an ordered dict of SArrays.
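
To make the "ordered dict of SArrays" picture concrete, here is a rough plain-Python sketch. It does not use GraphLab at all: an OrderedDict stands in for the SFrame and fixed-length tuples stand in for size-immutable SArrays, so it only illustrates the idea of column mutability, not how SFrame is implemented.

```python
from collections import OrderedDict

# Plain-Python stand-in (an assumption for illustration, not GraphLab code):
# each column is a fixed-length tuple, loosely like a size-immutable SArray.
frame = OrderedDict()
frame['First Name'] = ('Bob', 'Alice')
frame['age'] = (24, 23)

# Adding a column is just adding a key; the existing columns are untouched.
frame['Country'] = ('United States', 'Canada')

# Removing a column is deleting a key.
del frame['age']

# Column order follows insertion order.
print(list(frame.keys()))
```

The columns themselves never change length here; only the set of columns does, which is the sense in which an SFrame is column-mutable.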



In [3]:

    
sf = gl.SFrame('data/people-example.csv')









    



[INFO] graphlab.cython.cy_server: GraphLab Create v2.1 started. Logging: /tmp/graphlab_server_1504722254.log
Finished parsing file /workspace/notebooks/data/people-example.csv
Parsing completed. Parsed 7 lines in 0.026414 secs.
------------------------------------------------------
Inferred types from first 100 line(s) of file as
column_type_hints=[str,str,str,int]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
Finished parsing file /workspace/notebooks/data/people-example.csv
Parsing completed. Parsed 7 lines in 0.022623 secs.
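
The type hints in the log above are one Python type per column. As a rough sketch of what applying such hints means, the snippet below casts each column of a small CSV with the matching type. It is plain Python using the csv module, not GraphLab's parser; parse_with_hints is a hypothetical helper, and the inlined sample assumes the file is comma-separated with the columns shown in this notebook.

```python
import csv
import io

# Two sample rows inlined for illustration (assumed layout of people-example.csv).
SAMPLE = (
    "First Name,Last Name,Country,age\n"
    "Bob,Smith,United States,24\n"
    "Alice,Williams,Canada,23\n"
)

def parse_with_hints(text, type_hints):
    """Parse CSV text and cast each column with the matching type hint."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # the first line holds the column names
    rows = []
    for record in reader:
        # Pair each value with the type for its column and convert it.
        rows.append([cast(value) for cast, value in zip(type_hints, record)])
    return header, rows

header, rows = parse_with_hints(SAMPLE, [str, str, str, int])
print(header)
print(rows[0])
```

With the hint list [str, str, str, int], the age values come back as integers while the other columns stay strings. A wrong hint (for example int for Country) would raise a ValueError here, which is the kind of failure the log message suggests correcting by editing the hint list.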

SFrame basics



In [4]:

    
sf  # evaluating the SFrame displays the first few rows of the table

    Out[4]:

    First Name  Last Name  Country        age
    Bob         Smith      United States  24
    Alice       Williams   Canada         23
    Malcolm     Jone       England        22
    Felix       Brown      USA            23
    Alex        Cooper     Poland         23
    Tod         Campbell   United States  22
    Derek       Ward       Switzerland    25

[7 rows x 4 columns]



In [5]:

    
sf.head()  # first rows (up to 10 by default, so all 7 rows here)

    Out[5]:

    First Name  Last Name  Country        age
    Bob         Smith      United States  24
    Alice       Williams   Canada         23
    Malcolm     Jone       England        22
    Felix       Brown      USA            23
    Alex        Cooper     Poland         23
    Tod         Campbell   United States  22
    Derek       Ward       Switzerland    25

[7 rows x 4 columns]



In [6]:

    
sf.tail()  # last rows (up to 10 by default, so again all 7 rows)

    Out[6]:

    First Name  Last Name  Country        age
    Bob         Smith      United States  24
    Alice       Williams   Canada         23
    Malcolm     Jone       England        22
    Felix       Brown      USA            23
    Alex        Cooper     Poland         23
    Tod         Campbell   United States  22
    Derek       Ward       Switzerland    25

[7 rows x 4 columns]

Inspect the dataset



In [8]:

    
sf['Country']

    Out[8]:

dtype: str
Rows: 7
['United States', 'Canada', 'England', 'USA', 'Poland', 'United States', 'Switzerland']



In [9]:

    
sf['age'].mean()

    Out[9]:

23.142857142857146
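
As a quick sanity check on that value, the mean can be recomputed by hand from the seven ages in the table. This is plain Python on a copied list, not a GraphLab call; the last printed digits may differ slightly from the output above because of floating-point rounding.

```python
# Ages copied from the 'age' column of the table above.
ages = [24, 23, 22, 23, 23, 22, 25]

total = sum(ages)                      # 162
mean_age = total / float(len(ages))    # 162 / 7

print(total)
print(round(mean_age, 4))  # 23.1429
```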

Creating new columns



In [10]:

    
sf

    Out[10]:

    First Name  Last Name  Country        age
    Bob         Smith      United States  24
    Alice       Williams   Canada         23
    Malcolm     Jone       England        22
    Felix       Brown      USA            23
    Alex        Cooper     Poland         23
    Tod         Campbell   United States  22
    Derek       Ward       Switzerland    25

[7 rows x 4 columns]



In [11]:

    
sf['Full Name'] = sf['First Name'] + ' ' + sf['Last Name']



In [12]:

    
sf

    Out[12]:

    First Name  Last Name  Country        age  Full Name
    Bob         Smith      United States  24   Bob Smith
    Alice       Williams   Canada         23   Alice Williams
    Malcolm     Jone       England        22   Malcolm Jone
    Felix       Brown      USA            23   Felix Brown
    Alex        Cooper     Poland         23   Alex Cooper
    Tod         Campbell   United States  22   Tod Campbell
    Derek       Ward       Switzerland    25   Derek Ward

[7 rows x 5 columns]
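
The column expression used for 'Full Name' works element by element: each first name is joined to the last name in the same row. A plain-Python equivalent on ordinary lists (an illustration only, using the first three rows of the table) looks like this:

```python
# First three rows of the two name columns, copied from the table above.
first_names = ['Bob', 'Alice', 'Malcolm']
last_names = ['Smith', 'Williams', 'Jone']

# Pair up the values row by row and join them with a space.
full_names = [first + ' ' + last for first, last in zip(first_names, last_names)]

print(full_names)  # ['Bob Smith', 'Alice Williams', 'Malcolm Jone']
```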

Apply a function for data transformation



In [13]:

    
sf['Country']

    Out[13]:

dtype: str
Rows: 7
['United States', 'Canada', 'England', 'USA', 'Poland', 'United States', 'Switzerland']



In [14]:

    
def transform_country(country):
    return 'United States' if country == 'USA' else country



In [15]:

    
transform_country('USA')

    Out[15]:

'United States'



In [16]:

    
transform_country('India')

    Out[16]:

'India'



In [17]:

    
sf['Country'] = sf['Country'].apply(transform_country)



In [18]:

    
sf

    Out[18]:

    First Name  Last Name  Country        age  Full Name
    Bob         Smith      United States  24   Bob Smith
    Alice       Williams   Canada         23   Alice Williams
    Malcolm     Jone       England        22   Malcolm Jone
    Felix       Brown      United States  23   Felix Brown
    Alex        Cooper     Poland         23   Alex Cooper
    Tod         Campbell   United States  22   Tod Campbell
    Derek       Ward       Switzerland    25   Derek Ward

[7 rows x 5 columns]
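
If more than one spelling needs to be normalized, a lookup table can be easier to extend than a chain of conditions. The sketch below is plain Python on a copied list of the original country values; normalize_country and COUNTRY_ALIASES are hypothetical names, and the 'US' alias is an invented example that does not occur in this dataset. A function like this could be passed to apply in the same way as transform_country.

```python
# Hypothetical alias table; only 'USA' actually occurs in this dataset.
COUNTRY_ALIASES = {
    'USA': 'United States',
    'US': 'United States',  # invented extra alias, for illustration only
}

def normalize_country(country):
    """Return the canonical name for a known alias, else the input unchanged."""
    return COUNTRY_ALIASES.get(country, country)

# Country values as they were before the transformation.
countries = ['United States', 'Canada', 'England', 'USA',
             'Poland', 'United States', 'Switzerland']

# Apply the function to every element, one value at a time.
cleaned = [normalize_country(c) for c in countries]
print(cleaned)
```

Values that are not in the table pass through unchanged, which matches the behaviour of transform_country for inputs such as 'India'.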
